Program, method, and device for managing system configuration

ABSTRACT

A system configuration management program that identifies what kind of resources are lacking in a managed system, based on the current operating condition of that system, and installs required resources in a timely manner. An operating condition monitor observes current load condition of a managed system that is in operation. According to the degree of increase in the system load observed in a predetermined period, a resource addition decision unit determines whether it is necessary to add hardware resources to the system, as well as what kind of hardware resources should be added, while consulting a resource addition policy dataset. If it is determined that an additional server is required, a server addition unit activates a spare server for use in the system. If it is determined that an additional storage unit is required, a storage addition unit permits the system to make access to a spare storage device.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based upon and claims the benefits of priority from the prior Japanese Patent Application No. 2004-188516, filed on Jun. 25, 2004, the entire contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a computer program, method, and device for managing configuration of a system that is formed from a plurality of servers. More particularly, the present invention relates to a system configuration management program, method, and device that can install additional hardware resources automatically.

2. Description of the Related Art

A variety of network services are available today, which include, for example, web-based information services and electronic mail systems. As the number of service users grows, server systems providing such services sometimes experience a sudden increase in their processing load even during normal operations. Service providers have to cope with the increased access needs by raising the performance of their system as necessary. Typically, the process of system performance enhancement involves addition of required resources (such as server modules and storage devices) and subsequent reconfiguration of the entire system.

As the system grows in scale, it becomes more and more difficult to identify what functions are lacking and how urgent the situation is. In view of this fact, several researchers propose a technique that automatically changes the allocation of computer resources depending on the load level of each user organization (see Japanese Patent Application Publication No. 2002-24192). Some other researchers propose a distributed system in which a remote system monitors performance of a storage subsystem, and if a shortage of free space is observed, the remote system requests a relevant storage management site to install additional storage devices (see Japanese Patent Application Publication No. 2003-196135, paragraphs [0036]-[0040]).

When a system shows a performance drop, it is not always easy to determine whether the problem comes from a lack of processing power on the server's end or an insufficient capacity on the storage system's end. Adding another storage device in an attempt to expand the free space could impose an increased amount of storage management workload on an existing server. This means that the problem of increased system load may not always be solved by such a straightforward mechanism as adding a new storage device if a shortage of storage space is detected.

As can be seen from the above, one drawback of conventional approaches is that they only try to supply what appears to be lacking, without considering relationships between different kinds of resources, and thus fail to take a correct countermeasure. The result is an exhaustion of other resources that have not been enhanced, or an inefficient resource usage due to unnecessary addition of resources that is incurred by incorrect management decisions.

Another drawback is a lack of consideration of dissimilar behaviors of individual services provided. Some systems offer a plurality of services simultaneously, with separate hardware resources allocated to each service. Since those services consume their resources in different ways, the service provider faces difficulty in guaranteeing the service level agreement (SLA) of each individual service. This situation leads to increasing demands for an improved configuration management system that can identify what kinds of resources are really needed and supply appropriate resources to the managed system.

SUMMARY OF THE INVENTION

In view of the foregoing, it is an object of the present invention to provide a system configuration management program and device that can identify what kind of resources are lacking in a managed system, based on the current operating condition of that system, and install additional resources in a timely manner.

To accomplish the above object, the present invention provides a system configuration management program for adding hardware resources to a system in operation. This system configuration management program causes a computer to function as: (a) an operating condition monitor that observes load of the system in operation; (b) a resource addition decision unit that determines whether it is necessary to add hardware resources to the system, and what kind of hardware resources should be added, according to the degree of increase in the load observed in a predetermined period, with reference to a resource addition policy dataset that provides policy rules including whether to add a server or a storage unit; (c) a server addition unit that activates a spare server if the resource addition decision unit determines that an additional server is required, the spare server having thus far been connected to the system as a spare resource in a standby state; and (d) a storage addition unit that permits the system to make access to a spare storage device if the resource addition decision unit determines that an additional storage unit is required, the spare storage device having thus far been connected to the system as a spare resource that cannot be accessed from the system.

Additionally, to accomplish the above object, the present invention provides a system configuration management device for adding hardware resources to a working system. This system configuration management device comprises the following elements: (a) an operating condition monitor that observes load of the system in operation; (b) a resource addition decision unit that determines whether it is necessary to add hardware resources to the system, and what kind of hardware resources should be added, according to the degree of increase in the load observed in a predetermined period, with reference to a resource addition policy dataset that provides policy rules including whether to add a server or a storage unit; (c) a server addition unit that activates a spare server if the resource addition decision unit determines that an additional server is required, the spare server having thus far been connected to the system as a spare resource in a standby state; and (d) a storage addition unit that permits the system to make access to a spare storage device if the resource addition decision unit determines that an additional storage unit is required, the spare storage device having thus far been connected to the system as a spare resource that cannot be accessed from the system.

The above and other objects, features and advantages of the present invention will become apparent from the following description when taken in conjunction with the accompanying drawings which illustrate preferred embodiments of the present invention by way of example.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a conceptual view of the present invention.

FIG. 2 shows a system configuration according to an embodiment of the present invention.

FIG. 3 shows an example of hardware configuration of an administration server used in the present embodiment of the invention.

FIG. 4 shows processing functions of an administration server.

FIG. 5 shows an example data structure of a policy dataset.

FIG. 6 shows an example data structure of a service definition dataset.

FIG. 7 shows an example of a configuration dataset.

FIG. 8 shows an example data structure of a log.

FIG. 9 is a flowchart of a process of allocating additional resources, which is executed by the administration server.

FIG. 10 shows an example of an updated configuration dataset.

FIG. 11 shows an example data structure of CPU load-based server addition policy.

FIG. 12 shows an example data structure of server addition policy based on the number of accesses handled by a load balancer.

FIG. 13 shows the relationship between the access count increase of service A and access count increase threshold.

FIG. 14 shows an example data structure of a service flag dataset.

FIG. 15 shows an example data structure of a storage addition policy dataset.

FIG. 16 is a flowchart of a resource addition process based on the actual number of service requests distributed from a load balancer.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention will be described below with reference to the accompanying drawings, wherein like reference numerals refer to like elements throughout. The following description begins with an overview of the present invention and then proceeds to a more specific embodiment of the invention.

FIG. 1 is a conceptual view of the present invention. Illustrated is a system that serves the users sitting at their terminals 1 a and 1 b, and so on via a network 1. Specifically, a working system 2 employs a plurality of servers 3 a and 3 b, as well as a storage device array 4 containing a plurality of storage devices 4 a, 4 b, 4 c, and 4 d. The servers 3 a and 3 b are connected to the network 1 via a load balancer 2 a and a first switch 2 b. The storage devices 4 a to 4 d are coupled to the servers 3 a and 3 b via a second switch 2 c.

Also connected to the working system 2 is a spare server 3 c. More specifically, this spare server 3 c is connected to the first and second switches 2 b and 2 c in the working system 2. The spare server 3 c is in a standby state and currently providing users with no particular services.

Further connected to the working system 2 are spare storage devices 4 e and 4 f, as part of the storage device array 4. Those two spare storage devices 4 e and 4 f are deactivated, and no access is allowed from the working system 2.

The above system is under the control of an administration server 5. The administration server 5 has the following functional elements: an operating condition monitor 5 a, a resource addition decision unit 5 b, a resource addition policy dataset 5 c, a server addition unit 5 d, and a storage addition unit 5 e. Functions of those elements are as follows:

The operating condition monitor 5 a observes current load of the working system 2. The task includes, for example, monitoring current usage of storage devices coupled to the servers 3 a and 3 b. With reference to a resource addition policy dataset 5 c, the resource addition decision unit 5 b determines whether it is necessary to add hardware resources to the working system 2, as well as what kind of hardware resources should be added if it is the case, according to the degree of increase in the system load observed by the operating condition monitor 5 a.

The resource addition policy dataset 5 c provides policy rules about whether to add a server or a storage device, depending on the degree of increase in the load of hardware resources that is observed within a predetermined time period during system operations. More specifically, the resource addition policy dataset 5 c defines a set of rules about what resources are to be added when a certain amount of increase is observed in the used capacity during a period that is specified as measurement period. In the example of FIG. 1, it gives a rule stating that a storage device will be added if the observed increase of used capacity in a three-day period is equal to or greater than 10 gigabytes (GB) and smaller than 25 GB. Also stated is such a rule that a server will be added if the used capacity has increased by 25 GB or more in a three-day period.

If it is determined that an additional server is required, the server addition unit 5 d activates the spare server 3 c, which is already connected to the working system 2, but has thus far been in a standby state. If it is determined that an additional storage device is required, the storage addition unit 5 e permits the working system 2 to make access to spare storage devices 4 e and 4 f. The spare storage devices 4 e and 4 f have been connected to the working system 2 physically, but the access from the working system 2 has thus far been blocked.

To summarize the above, the administration server operates as follows. Based on the current load condition of the working system 2 observed by the operating condition monitor 5 a, the resource addition decision unit 5 b determines whether the working system 2 needs additional hardware resources and what kind of resources they should be, according to a given resource addition policy dataset 5 c. When it is determined that an additional server is required, the server addition unit 5 d activates a spare server 3 c. When it is determined that an additional storage device is required, the storage addition unit 5 e activates spare storage devices 4 e and 4 f, so that the working system 2 can make access to them.

That is, the necessity of additional hardware resources is determined according to the current load condition of the working system 2, and an appropriate additional resource is added promptly to the system that is in operation. With a predefined resource addition policy dataset 5 c shown in FIG. 1, for example, the administration server 5 can make an adequate choice between an additional server and an additional storage device, depending on how fast the used storage space is increasing.

For a system offering different classes of services, the proposed administration server has a capability of choosing the type of hardware resources on an individual service basis. As a specific embodiment of the present invention, the following section will now describe in detail a system that supplies lacking hardware resources automatically.

FIG. 2 shows a system configuration according to an embodiment of the present invention. The illustrated web system 200 provides various services to a plurality of terminals 21 and 22 via the Internet 20. While FIG. 2 shows only two terminals 21 and 22 for the purpose of simplicity, it should be appreciated that there are many such terminals capable of having access to the web system 200.

The web system 200 is under the control of an administration server 100, which is linked to the web system 200 through a network 10. The web system 200 is formed from the following components: a load balancer 210, a layer-2 (L2) switch 220, a plurality of servers 231, 232, and 233, a fiber channel (FC) switch 240, and a storage device array 250 containing multiple storage devices 251 to 256.

The load balancer 210 is connected to the Internet 20 to communicate with the terminals 21 and 22. By monitoring processing load and other conditions of active servers, the load balancer 210 distributes service requests from the terminals 21 and 22, so that the workload of requested tasks will not concentrate on particular servers. Placed between the load balancer 210 and servers 231 to 233 is an L2 switch 220, which forwards service requests to their destination servers as specified by the load balancer 210.

The servers 231, 232, and 233 execute requested tasks according to each service request from the terminals 21 and 22, and return responses containing the processing results to those terminals. Basically, at least one of the plurality of servers 231, 232, and 233 is reserved as a spare server. The spare server is exempted from any service tasks, while letting other servers deal with them. The working servers 231, 232, and 233 make access to the storage device array 250 when they need to do so in order to accomplish their tasks.

The FC switch 240, disposed between the servers 231, 232, and 233 and the storage device array 250, receives storage access commands from the servers 231, 232, and 233 and forwards them to respective destination storage devices. The storage device array 250 manages a plurality of storage devices 251 to 256. At least one of those storage devices 251 to 256 is reserved as a spare storage device, while the others are in active use by the servers.

The administration server 100 is linked to individual components of the web system 200 via a network 10, so that it can collect information about each component's operating status. When the web system 200 does not have enough hardware resources to provide a specific service, the administration server 100 determines which additional hardware resource (e.g., server or storage device) should be added. The administration server 100 then reconfigures the web system 200 so that the newly incorporated hardware resource will be used for the service.

End users sitting at their terminals 21 and 22 use services provided by the web system 200. More specifically, each service is made available in a corresponding virtual local area network (VLAN) environment with a unique VLAN ID. That is, the server system is virtually divided into a plurality of network segments. Further, each service is associated with a uniform resource locator (URL) which is specific to that service. End users can receive a particular service by making access to the associated URL from their terminals 21 and 22. In the case a plurality of servers offer the same service, the load balancer 210 distributes the workloads across those servers.
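The URL-based routing described above could be sketched roughly as follows. This is only an illustration: the text does not state which balancing algorithm the load balancer 210 uses, so the least-loaded selection, the `load` metric, and the class and method names are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    ip: str
    load: float = 0.0  # hypothetical load metric gathered by monitoring


@dataclass
class LoadBalancer:
    # Maps a service URL to the servers currently offering that service.
    pools: Dict[str, List[Server]] = field(default_factory=dict)

    def register(self, url: str, server: Server) -> None:
        """Add a server to the pool that serves the given URL."""
        self.pools.setdefault(url, []).append(server)

    def pick(self, url: str) -> Server:
        """Choose the least-loaded server for this URL (assumed policy)."""
        servers = self.pools.get(url)
        if not servers:
            raise LookupError(f"no server offers {url}")
        return min(servers, key=lambda s: s.load)
```

Registering a newly activated server for a URL is then enough for it to start receiving a share of the requests for that service.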

The environments for providing services are uniform in terms of physical and logical configurations of servers, except for Internet Protocol (IP) addresses. Such a server environment can be set up by using master image data. All setup information (e.g., VLAN ID, IP address range, master image, storage capacity requirements) that is necessary for a server to provide a particular service is defined as a set of policies and stored in the administration server 100. While being deployed as a separate system component independent of the web system 200, the administration server 100 is allowed to communicate, at the IP level, with all managed devices in the web system 200.

The above-described system permits an additional server or storage device to be installed according to the need of individual services. More specifically, the administration server 100 measures the amount of storage space consumed by a particular service at predetermined intervals and compares the observed increase with a service-specific threshold, thus determining whether to add a storage device alone or to place an extra server. If the latter option is taken, the administration server 100 transfers an image copy of operating system and application software to the server to be added. The server then starts up with the operating system and becomes ready to receive an IP address assignment and the like from the administration server 100.

To establish a new system configuration, the administration server 100 retrieves necessary data files from an existing storage device and copies them to a new storage device. In addition, the administration server 100 instructs the load balancer 210 to associate the newly installed server with a specified service. The administration server 100 also reconfigures the FC switch 240 and other related system components so that a new storage device can be accessed from corresponding servers.

Hardware Platforms

FIG. 3 shows an example of a hardware platform for an administration server used in the present embodiment of the invention. The administration server 100 employs a central processing unit (CPU) 101 that controls the entire computing facilities while interacting with other elements via a common bus 107, which include: a random access memory (RAM) 102, a hard disk drive (HDD) 103, a graphics processor 104, an input device interface 105, and a communication interface 106.

The RAM 102 serves as temporary storage for the whole or part of operating system (OS) programs and application programs that the CPU 101 executes. It also stores other various data objects manipulated by the CPU 101 at runtime. The HDD 103 stores program and data files of the operating system and various applications. The graphics processor 104 produces video images in accordance with drawing commands from the CPU 101 and displays them on a screen of an external monitor unit 11 coupled thereto. The input device interface 105 is used to receive signals from external input devices, such as a keyboard 12 and a mouse 13. Those input signals are supplied to the CPU 101 via the bus 107. The communication interface 106 is connected to a network 10, allowing the CPU 101 to exchange data with other computers (not shown) on the network 10.

A computer with the above-described hardware configuration serves as a platform for realizing the processing functions of the present embodiment. While FIG. 3 only shows a hardware structure of the administration server 100, the illustrated structure can also be applied to other system components such as terminals 21 and 22 and servers 231, 232, and 233.

Administration Server Functions

Referring next to FIG. 4, processing functions of the administration server 100 will be described below. In the example of FIG. 4, two servers 231 and 232 are activated, while the remaining server 233 is in a standby state. In the storage device array 250, four storage devices 251 to 254 are activated, while the remaining two storage devices 255 and 256 are reserved.

The administration server 100 stores several datasets for administrative purposes, which are: a policy dataset 111, a service definition dataset 112, a configuration dataset 113, and a log file 114. The policy dataset 111 defines resource management rules applicable to various situations that the web system 200 may encounter. The service definition dataset 112 contains records about what services the web system 200 can offer, and what resources are required to provide those services. The configuration dataset 113 contains records indicating currently available services and their respective resource allocations. The log file 114 has a collection of log records showing the past operating conditions of the web system 200.

Also included in the administration server 100 are: a load balancer (LB) manager 120, a switch manager 130, a server manager 140, an image distributor 150, and a storage manager 160. The administration server 100 uses those components to monitor the operating condition of the web system 200, as well as to remotely control the web system 200 when it has to be reconfigured. In FIG. 4, the small boxes with a caption “MON” represent such monitoring functions, while those with a caption “CTL” represent control functions.

The load balancer manager 120 controls the load balancer 210 while monitoring its operating condition. The load balancer 210 needs such external control when, for example, it begins forwarding incoming service requests to a newly added server. The switch manager 130 controls the L2 switch 220, as well as monitoring its operating condition. The server manager 140 controls servers 231, 232, and 233, as well as monitoring their respective operating conditions. The servers 231, 232, and 233 require such external control when, for example, a new storage device is installed for use in offering services.

The image distributor 150 maintains multiple sets of image data 151 for delivery of system disk image files to a requesting server. The term “image data” refers to program and data backup files for operating system and applications stored in a system disk. The image distributor 150 transfers an appropriate set of image data to a new server's system disk, which enables the server to boot up with the backed-up operating system. Another function of the image distributor 150 is to help a server to set up its operating environment with the received image data 151.

The storage manager 160 controls the FC switch 240 and storage device array 250 while monitoring their respective operating conditions.

The administration server 100 has more components to establish a new system configuration, which are: a data collector 171, a resource usage analyzer 172, and a configuration controller 173. The data collector 171 gathers records concerning operating conditions of the web system 200. The web system 200 is monitored in a distributed manner by the load balancer manager 120, switch manager 130, server manager 140, and storage manager 160 in the administration server 100. The log file 114 is where the gathered records reside. The resource usage analyzer 172 retrieves data from the log file 114 for comparison with entries of the policy dataset 111. The resource usage analyzer 172 then determines, for each service, whether it is necessary to add hardware resources and, if it is, what hardware resource is suitable.

Upon determination of resources to be added for use in a particular service, the configuration controller 173 consults a service definition dataset 112 to find what operating environment (e.g., IP address) the service requires. Additionally, the configuration controller 173 consults a configuration dataset 113 to determine which server is currently responsible for the service of interest, and it formulates a new configuration for the system. The configuration controller 173 then sends commands to the load balancer manager 120, switch manager 130, server manager 140, image distributor 150, and storage manager 160. Those commands request them to control the resources for which they are responsible, such that the new system configuration will be established. Lastly, the configuration controller 173 updates the configuration dataset 113 with the new configuration.
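As an illustration of how the configuration controller 173 might turn a decision into commands for the individual managers, the following sketch builds an ordered list of (manager, command) pairs for adding a server. The function, the command strings, and the manager labels are hypothetical; the ordering is an assumption modeled on the process flow described for FIG. 9, and the switch manager is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ServiceDefinition:
    """Subset of a service definition record (field names are illustrative)."""
    name: str
    url: str
    image: str
    vlan_id: int


def plan_server_addition(
    service: ServiceDefinition, new_ip: str, extra_capacity_gb: int
) -> List[Tuple[str, str]]:
    """Return (manager, command) pairs in the order they would be issued."""
    return [
        # Deliver the system disk image and set up the operating environment.
        ("image_distributor", f"deliver '{service.image}' and assign IP {new_ip}"),
        # Reserve extra storage and move the service data onto it.
        ("storage_manager", f"allocate {extra_capacity_gb} GB for service {service.name}"),
        # Register the new server as a member of the service's VLAN.
        ("server_manager", f"join {new_ip} to VLAN {service.vlan_id}"),
        # Let the load balancer forward requests for the URL to the new server.
        ("load_balancer_manager", f"add {new_ip} as destination for {service.url}"),
    ]
```

Keeping the plan as plain data separates the formulation of the new configuration from its execution, so the configuration dataset can be updated only after every command has been carried out.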

Administrative Datasets

With reference to some specific examples, this section gives more details about various administrative datasets stored in the administration server 100. FIG. 5 shows an example data structure of a policy dataset. This policy dataset 111 has the following data fields: “Service Name,” “Measurement Period,” “Capacity Usage Increase Threshold,” “Additional Resource,” and “Additional Capacity.” A record concerning resource addition policies is formed from such data items associated in the same row.

Specifically, the “Service Name” field contains the name of a service which may require additional resources. The “Measurement Period” field gives a measurement period during which the operating state is observed to determine the necessity of additional resources. The “Capacity Usage Increase Threshold” field defines a threshold of increase in storage capacity usage for determining the need for additional resources. In the present example of FIG. 5, several policy rules are defined for different amounts of capacity usage increase in units of gigabytes (GB). The “Additional Resource” field specifies what type of resources is to be added when it is necessary. The “Additional Capacity” field specifies the amount of storage capacity to be added.

The first record of the dataset, for example, gives such a rule that a 100-GB storage device should be added if an increase of 10 GB or more is observed in storage usage of service A during a three-day period. The second record means that an extra server will be required if an increase of 25 GB or more is observed in storage usage of service A during a three-day period. The second record further states that, if this is the case, an additional storage device with a capacity of 100 GB has to be installed, along with the additional server.

The policy dataset 111 of FIG. 5 contains two or more records whose conditions may overlap one another. For instance, the conditions of the first and second records explained above are both satisfied when the service A has used another 25 GB or more in a three-day period. In such cases, the record with the largest value of “Capacity Usage Increase Threshold” will be applied.
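The overlap rule can be made concrete with a small sketch. The two records below follow the first and second records described for FIG. 5; the Python field names and the labels "storage" and "server" are illustrative assumptions, not the literal contents of the dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PolicyRecord:
    service: str
    period_days: int             # "Measurement Period"
    threshold_gb: float          # "Capacity Usage Increase Threshold"
    additional_resource: str     # "Additional Resource" (assumed labels)
    additional_capacity_gb: int  # "Additional Capacity"


POLICY = [
    PolicyRecord("A", 3, 10, "storage", 100),
    PolicyRecord("A", 3, 25, "server", 100),
]


def select_policy(
    records: List[PolicyRecord], service: str, increase_gb: float
) -> Optional[PolicyRecord]:
    """Pick the satisfied record with the largest threshold, if any."""
    matches = [
        r for r in records
        if r.service == service and increase_gb >= r.threshold_gb
    ]
    return max(matches, key=lambda r: r.threshold_gb, default=None)
```

With these records, an increase of 12 GB for service A selects the storage-only rule, while an increase of 30 GB satisfies both records and selects the server rule because its threshold is larger.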

FIG. 6 shows an example data structure of a service definition dataset. This service definition dataset 112 has the following data fields: “Service Name,” “URL,” “Image,” “VLAN ID,” “IP Address,” and “Initial Capacity.” A record concerning each particular service is formed from such data items associated in the same row.

The “Service Name” field contains the name of a service which is available from the web system 200. Note that all other data items in the same record are related to that particular service. Specifically, the “URL” field contains a URL that is assigned to that service. The “Image” field gives the name of image data to be delivered to a server when it starts offering that service. The “VLAN ID” field contains an identifier of VLAN for that service. The “IP Address” field gives a range of IP addresses that can be assigned to a server when it starts offering that service. The “Initial Capacity” field specifies a minimum capacity of storage devices required to make that service available.

Take the first record, for example. This record indicates first that service A is to be available at “http://a.” It also says that the server providing this service A receives “image data a.” Service A forms a VLAN environment with VLAN ID=1 and is allowed to use an IP address selected from a range of “1.1.1.1” to “1.1.1.255.” When a server starts offering service A, the server is assigned a storage device with a capacity of 100 GB.
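One way to select an address from the range given in the “IP Address” field is sketched below. The text does not say how the address is chosen, so picking the lowest address not yet in use is an assumption, and `next_free_ip` is a hypothetical helper name.

```python
import ipaddress
from typing import Optional, Set


def next_free_ip(first: str, last: str, in_use: Set[str]) -> Optional[str]:
    """Return the lowest address in [first, last] that is not in use."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    for value in range(start, end + 1):
        candidate = str(ipaddress.IPv4Address(value))
        if candidate not in in_use:
            return candidate
    return None  # the whole range is exhausted
```

For service A, calling `next_free_ip("1.1.1.1", "1.1.1.255", {"1.1.1.1"})` would yield "1.1.1.2" for a second server.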

FIG. 7 shows an example of a configuration dataset. This configuration dataset 113 has the following data fields: “URL,” “VLAN ID,” “Port Number,” “Image,” “IP Address,” “Storage Device ID,” and “Capacity.”

The “URL” field contains a URL of a service that is currently available. The “VLAN ID” field shows the identifier of a VLAN to which the server providing the service belongs. The “Port Number” field gives a port number of the L2 switch 220 to which the service-providing server is linked. The “Image” field contains the name of image data that was delivered to the service-providing server. The “IP Address” field shows the IP address currently used by the service-providing server. The “Storage Device ID” field stores the identification code of a storage device that is used to offer the service. The “Capacity” field shows the amount of storage capacity assigned to the service.

The configuration dataset 113 shows the current setup of the web system 200. For example, the service available at URL “http://a” is currently provided by a server working in a VLAN environment with VLAN ID=1. That server has an IP address of “1.1.1.1,” and it received “image data a” when it started up. The service is allowed to use a storage capacity of up to 100 GB, which is available in the storage device with an identification code of “1.”

FIG. 8 shows an example data structure of a log file. This log file 114 has the following data fields: “IP Address,” “Service Name,” “Data Acquisition Time,” and “Usage.” A record concerning load condition is formed from such data items associated in the same row. Specifically, the “IP Address” field shows the IP address of a server which sent out data in providing a particular service, and the “Service Name” field shows the name of that service. The “Data Acquisition Time” field contains a timestamp indicating when this log record was created. The “Usage” field indicates the usage level of a storage device assigned to the service.
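Log records of this shape lend themselves to a simple before-and-after comparison. The sketch below is a rough illustration that assumes usage is logged in gigabytes and that each service has one record per acquisition time; the field and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class LogRecord:
    ip_address: str
    service: str
    acquired_at: datetime  # "Data Acquisition Time"
    usage_gb: float        # "Usage" (unit assumed to be GB)


def usage_increase(
    log: List[LogRecord], service: str, period: timedelta
) -> Optional[float]:
    """Increase between the latest record and the newest record at least one period older."""
    records = sorted(
        (r for r in log if r.service == service), key=lambda r: r.acquired_at
    )
    if not records:
        return None
    latest = records[-1]
    cutoff = latest.acquired_at - period
    older = [r for r in records if r.acquired_at <= cutoff]
    if not older:
        return None  # not enough history for this measurement period
    return latest.usage_gb - older[-1].usage_gb
```

The returned value is the quantity that would be compared against the “Capacity Usage Increase Threshold” of the policy dataset 111.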

Administration Server Process Flow

With the functions and data described in the preceding sections, the administration server 100 operates as shown in the flowchart of FIG. 9. Specifically, FIG. 9 shows how the administration server 100 allocates additional resources. This process includes the following steps:

-   (Step S11) The data collector 171 consults the server manager 140 to collect data about how much capacity in the storage device array 250 is actually used by each individual service.
    -   More specifically, the server manager 140 sends a message requesting the working servers 231 and 232 to report their actual storage capacity usage for each service that they provide. In response to this request, each of the servers 231 and 232 sends information about its service-specific storage usage back to the server manager 140. The received information is passed to the data collector 171. The data collector 171 collects those data records about service-specific storage usage and writes them into a log file 114.
    -   Every service has its own directory in the file system, and the amount of data stored under that directory can be added up. In this step S11, the servers 231 and 232 make such a calculation for each service and report the result to the server manager 140.
-   (Step S12) The resource usage analyzer 172 selects one service from among those provided by the web system 200.
-   (Step S13) By consulting the log file 114, the resource usage analyzer 172 evaluates the increase in capacity usage during a given measurement period for the selected service.
    -   More specifically, the resource usage analyzer 172 first extracts necessary log records out of the log file 114, while skipping those unrelated to the service selected at step S12. What is extracted here is the latest set of records and an old set of records that were collected in the preceding measurement cycle (the measurement interval is specified in the “Measurement Period” field of the policy dataset 111). Now that the data is ready, the resource usage analyzer 172 compares the latest records with the past records, thereby identifying the amount of increase in storage capacity usage.
-   (Step S14) The resource usage analyzer 172 determines whether it is necessary to allocate an extra server to the selected service.
    -   More specifically, the resource usage analyzer 172 compares the observed increase in storage capacity usage of the selected service with each capacity usage increase threshold defined in the policy dataset 111. If the increase equals or exceeds one such threshold, the resource usage analyzer 172 looks into the corresponding “Additional Resource” field of the policy dataset 111, thus determining what is required in the present situation. If this data field indicates the need for an additional server, the resource usage analyzer 172 advances to step S15. If not, the process branches to step S20.
-   (Step S15) Now that the need for a server is established, the configuration controller 173 begins actual tasks of allocating a server to the selected service.
    -   More specifically, the configuration controller 173 consults the service definition dataset 112 to determine which image data, VLAN ID, and IP address should be sent to the new server. The configuration controller 173 then issues a data delivery request to the image distributor 150, specifying which image data to send. This request causes the image distributor 150 to transmit the specified image data files to a reserved server. The received image data is loaded into the server's local system disk where the operating system is supposed to be stored, so that the server can execute the OS and other functions on the basis of the received image data. The image distributor 150 further establishes an operating environment for the server by using the VLAN ID and IP address determined by the configuration controller 173.
-   (Step S16) The configuration controller 173 configures the storage device system.
    -   More specifically, the configuration controller 173 checks the “Additional Capacity” field of the policy dataset 111, thus determining what resources should be added. The configuration controller 173 requests the storage manager 160 to allocate a specified amount of additional storage space to the service. In response to this request, the storage manager 160 reserves the requested amount of space in the storage device array 250. The storage manager 160 also reconfigures the FC switch 240, so that the reserved space can be available to the service selected at step S12.
-   (Step S17) The configuration controller 173 moves service-related data. More specifically, the configuration controller 173 commands the storage manager 160 to move the data pertaining to the selected service to the newly reserved storage space. The storage manager 160 controls the storage device array 250 to transport necessary data according to that command.
-   (Step S18) The configuration controller 173 changes VLAN setup. More specifically, the configuration controller 173 instructs the server manager 140 to register the new server as a member node of the VLAN determined at step S15. The server manager 140 sets up the server environment accordingly.
-   (Step S19) The configuration controller 173 changes setup of the load balancer 210. More specifically, the configuration controller 173 requests the load balancer manager 120 to make the load balancer 210 recognize the new server as one of the destinations of incoming requests for the selected service. The load balancer manager 120 reconfigures the load balancer 210 accordingly. Upon completion of this setup, the configuration controller 173 updates the configuration dataset 113 to include a new entry of service configuration. The process then proceeds to step S22.
-   (Step S20) Since the test at step S14 indicates no need for additional servers, the resource usage analyzer 172 now determines whether the selected service requires more storage space.
    -   More specifically, the resource usage analyzer 172 compares the observed increase in storage capacity usage of the selected service with each relevant capacity usage increase threshold defined in the policy dataset 111. If the increase equals or exceeds one such threshold, the resource usage analyzer 172 looks into the corresponding “Additional Resource” field of the policy dataset 111 to determine what is required in the present situation. If this data field indicates the need for an additional storage device, the resource usage analyzer 172 advances to step S21. If not, the process skips to step S22.
-   (Step S21) The configuration controller 173 adds a new storage device.
    -   More specifically, the configuration controller 173 determines how much additional storage capacity is required, by consulting the “Additional Capacity” field of the policy dataset 111. The configuration controller 173 requests the storage manager 160 to allocate the specified amount of additional storage space to the service. In response to this request, the storage manager 160 reserves the requested amount of space in the storage device array 250. The storage manager 160 also reconfigures the FC switch 240, so that the reserved space can be available to the service selected at step S12. The configuration controller 173 then updates the configuration dataset 113 with the new configuration.
-   (Step S22) The resource usage analyzer 172 determines whether there is any unchecked service. If there is, the process returns to step S12. If all existing services have been checked, the process advances to step S23.
-   (Step S23) The data collector 171 determines whether there is an instruction to stop this additional resource allocation process. If there is, the process is terminated. If not, the process returns to step S11 to repeat the above steps.
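The decision made in steps S13, S14, S20, and S21 can be illustrated with a short sketch. This is not the implementation of the embodiment; the class name `PolicyRule`, the function names, the action labels, and the 20 GB rule are hypothetical, and the tie-breaking rule (largest matching threshold wins) is an assumption borrowed from the CPU load-based policy described later. Only the pairing of a 35 GB increase with an extra server and 200 GB of storage echoes the example given for service B.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PolicyRule:
    """One row of a policy dataset such as 111 (field values are illustrative)."""
    increase_threshold_gb: float  # capacity usage increase threshold
    additional_resource: str      # "server" or "storage"
    additional_capacity_gb: int   # "Additional Capacity" field


def match_rule(increase_gb: float, rules: List[PolicyRule]) -> Optional[PolicyRule]:
    """Steps S14/S20: find a rule whose threshold the increase equals or exceeds.

    Assumption: when several rules match, the one with the largest threshold wins.
    """
    matched = [r for r in rules if increase_gb >= r.increase_threshold_gb]
    return max(matched, key=lambda r: r.increase_threshold_gb, default=None)


def plan_actions(old_gb: float, latest_gb: float, rules: List[PolicyRule]) -> List[str]:
    """Return the ordered actions for one service (steps S13 to S21)."""
    increase_gb = latest_gb - old_gb           # step S13
    rule = match_rule(increase_gb, rules)
    if rule is None:
        return []                              # nothing to add; go on to step S22
    storage = f"allocate_storage:{rule.additional_capacity_gb}GB"
    if rule.additional_resource == "server":   # steps S15 to S19
        return ["deploy_image_to_spare_server", storage, "move_service_data",
                "join_vlan", "register_with_load_balancer",
                "update_configuration_dataset"]
    return [storage, "update_configuration_dataset"]  # step S21


# Hypothetical policy rows for one service; only the 35 GB / 200 GB pair
# echoes the example given in the description.
RULES = [
    PolicyRule(20, "storage", 100),
    PolicyRule(35, "server", 200),
]
```

Calling `plan_actions(100, 136, RULES)` yields the server path, whereas an increase of 25 GB yields only a storage allocation and an increase of 5 GB yields no action.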

The content of the configuration dataset 113 is kept up to date, since each time a new resource is added for a particular service, the configuration controller 173 updates the configuration dataset 113 with a new configuration. See, for example, the second record of the service definition dataset 112 (FIG. 6) and configuration dataset 113 (FIG. 7). This service B is made available at “http://b” by the server with an IP address “1.1.2.1.” Suppose that service B has consumed another 35 GB or more of storage space in two days. The administration server 100 now has to allocate an extra server and an additional storage device of 200 GB, as defined in the policy dataset 111 (FIG. 5). The configuration controller 173 thus reconfigures the web system 200. The new system configuration is then registered with the configuration dataset 113.

FIG. 10 shows the updated configuration dataset. Compared with the previous configuration dataset 113 (FIG. 7), the updated version has a new record that says: URL=“http://b,” VLAN ID=2, Port number=4, Image=“Image B,” IP address=1.1.2.2, Storage Device ID=3, and Capacity=400 GB. This record describes current resource allocation for the newly added server.

Notice that service B is now supported by two servers. When two or more servers work for the same service, as in the case of service B, those servers share the same set of data for that service. For this reason, the second record in the configuration dataset 113 of FIG. 10 has also been changed as a result of the installation of a new server. Specifically, the existing server (the one with an IP address of “1.1.2.1”) is now using a storage device “3” with a capacity of 400 GB, whereas it used a storage device “1” with a capacity of 200 GB before the server was added.
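The two changes made to the configuration dataset (a new record for the added server, and an updated storage assignment for the existing server of the same service) may be sketched as follows. The dictionary layout and the function `add_server_record` are hypothetical. The port number, VLAN ID, and image of the existing record are assumptions; its IP address, storage device ID, and capacity follow the description above.

```python
from typing import Dict, List

Record = Dict[str, object]


def add_server_record(config: List[Record], url: str, port: int, ip: str,
                      storage_id: int, capacity_gb: int) -> List[Record]:
    """Return a new configuration list with one more server for `url`."""
    updated: List[Record] = []
    template = None
    for rec in config:
        if rec["url"] == url:
            # Servers of the same service share one data set, so existing
            # records are pointed at the new storage allocation as well.
            rec = {**rec, "storage_id": storage_id, "capacity_gb": capacity_gb}
            template = rec
        updated.append(rec)
    if template is None:
        raise KeyError(f"no existing record for {url}")
    # The new server inherits VLAN ID and image from the existing record.
    updated.append({**template, "port": port, "ip": ip})
    return updated


# Record for service B before the change (port, VLAN ID, and image assumed).
before = [{"url": "http://b", "vlan_id": 2, "port": 2, "image": "Image B",
           "ip": "1.1.2.1", "storage_id": 1, "capacity_gb": 200}]
after = add_server_record(before, "http://b", port=4, ip="1.1.2.2",
                          storage_id=3, capacity_gb=400)
```

The function returns a new list rather than modifying its input, so the previous configuration remains available for comparison.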

In the way described above, the administration server 100 finds resources that the web system 200 is lacking and allocates what is required (i.e., server or storage device or both) to the service in need.

Resource Management Based on CPU Load

According to another aspect of the invention, the administration server 100 may be designed to monitor the CPU load of working servers and determine whether or not to add another server. In this case, the administration server 100 employs a new policy dataset, called “server addition policy dataset.”

FIG. 11 shows an example data structure of CPU load-based server addition policy. The illustrated server addition policy dataset 111 a has the following data fields: “Service Name,” “Sampling Interval,” “Additional CPUs,” and “CPU Activity Ratio Threshold.” A record concerning server addition policy is formed from such data items associated in the same row.

The “Service Name” field contains the name of a service. The “Sampling Interval” field specifies a sampling interval and the number of times that the CPU activity ratio should be monitored at the specified intervals. The “Additional CPUs” field gives the number of CPUs to be added when the observed CPU activity ratio equals or exceeds the threshold specified in the next data field, titled “CPU Activity Ratio Threshold.” Note that the determination uses an average of the CPU activity ratios sampled as specified in the “Sampling Interval” field.

According to the first record, for example, the CPU activity ratio in service A is to be sampled three times at intervals of ten seconds. If the average result is 70% or more, one additional CPU will be allocated to service A.

The server addition policy dataset 111 a of FIG. 11 contains two or more records whose conditions may overlap one another. For instance, the conditions of the first and second records are both satisfied when the CPU activity ratio of service A has reached 85%. In such cases, the record with the largest value in the “CPU Activity Ratio Threshold” field will be applied.

With the above-described server addition policy dataset 111 a, the administration server 100 operates as follows: First of all, the data collector 171 collects data about CPU activity ratios in each server at given sampling intervals. The data collector 171 performs this task in cooperation with the server manager 140. Subsequently, the resource usage analyzer 172 calculates an average CPU activity ratio from a predetermined number of samples and compares the result with a given threshold. It determines that the system needs server enhancement, if the resulting average CPU activity ratio is not smaller than the threshold.
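The averaging and record selection just described may be sketched as follows. The names `CpuRule` and `cpus_to_add` are hypothetical. The first rule reproduces the first record described for service A (three samples at ten-second intervals, 70%, one CPU); the second rule, with an 80% threshold and two CPUs, is an invented placeholder, since the description only states that a second record is also satisfied at 85%.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class CpuRule:
    """One row of a server addition policy dataset such as 111 a."""
    service: str
    sample_count: int      # number of samples to average
    interval_s: int        # sampling interval in seconds (not used in the decision)
    additional_cpus: int   # "Additional CPUs" field
    threshold_pct: float   # "CPU Activity Ratio Threshold" field


def cpus_to_add(service: str, samples_pct: List[float], rules: List[CpuRule]) -> int:
    """Average the most recent samples and apply the matching rule.

    When several rules are satisfied, the one with the largest threshold
    is applied. Returns 0 when no rule applies.
    """
    satisfied = [
        r for r in rules
        if r.service == service
        and len(samples_pct) >= r.sample_count
        and mean(samples_pct[-r.sample_count:]) >= r.threshold_pct
    ]
    best = max(satisfied, key=lambda r: r.threshold_pct, default=None)
    return best.additional_cpus if best is not None else 0


CPU_RULES = [
    CpuRule("A", 3, 10, 1, 70.0),  # first record, as described for service A
    CpuRule("A", 3, 10, 2, 80.0),  # hypothetical second record
]
```

With these rules, samples averaging 85% select the second record, samples averaging exactly 70% select the first, and a lower average adds nothing.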

In adding a server to the system, the configuration controller 173 selects an appropriate spare server with a performance equivalent to what is specified in the “Additional CPUs” field. The configuration controller 173 commands the image distributor 150 to send image data to the spare server. Upon startup of the server, the image distributor 150 helps the server set up its IP address. The storage manager 160, on the other hand, reserves a required amount of storage space in the storage device array 250, and it configures the FC switch 240 such that the new server can make access to that reserved storage device. The switch manager 130 is responsible for setting up the L2 switch 220 such that the new server can enroll as a member node of a specified VLAN. The configuration controller 173 also instructs the load balancer 210 to associate the newly installed server with a specified service. In this way, the administration server 100 of the present embodiment adds a server and other resources to the managed system according to the observed CPU activity ratios.

Resource Management Based on Load Balancing

According to yet another aspect of the present invention, the administration server 100 may be designed to determine the necessity of additional resources, based on how many service requests the load balancer 210 is handling. At predetermined sampling intervals, the load balancer manager 120 records the access count of each service that it handles. Based on this access count, the resource usage analyzer 172 determines whether it is necessary to add a server or a storage device or both. In the following example, a new set of policy definitions is used in place of the policy dataset 111 described earlier in FIG. 4. They are: a server addition policy dataset (FIG. 12), a service flag dataset (FIG. 14), and a storage addition policy dataset (FIG. 15).

FIG. 12 shows an example data structure of server addition policies based on the number of accesses handled by a load balancer. The illustrated server addition policy dataset 111 b has the following data fields: “Service Name,” “Access Count Increase Threshold,” “Sampling Interval,” “Additional Servers,” and “Threshold Increment.” A record concerning server addition policy is formed from such data items associated in the same row.

The “Service Name” field contains the name of a service. The “Access Count Increase Threshold” field contains a threshold value for access count increase, which is used to determine whether or not to install an additional server. Note that this is a variable threshold, which increases each time a new server is added, as will be described later. The “Sampling Interval” field specifies at what intervals a new access count should be collected. The “Additional Servers” field gives the number of servers to be added when the observed access count increase equals or exceeds the variable threshold. The “Threshold Increment” field specifies an increment of the access count increase threshold, which applies each time a new server is added.

Suppose, for example, that the current access count increase threshold is set to 3,000 for service A. The load balancer 210 reports at predetermined intervals how many access requests for service A it has routed to corresponding servers. If a new access count exceeds the previous count by 3,000 or more, the administration server 100 allocates another server to service A, and at the same time, the threshold is increased by an increment of 3,000.

FIG. 13 shows the relationship between the access count increase of service A and access count increase threshold. The horizontal axis represents access count, and the vertical axis represents access count increase threshold. As can be seen from FIG. 13, the access count increase threshold stays at a constant level of 3,000 until the actual access count increase reaches 3,000. If the access count reaches the first threshold of 3,000, then the threshold goes up to the next level, 6,000, with an increment of 3,000. Likewise, the access count increase threshold is raised by 3,000 each time the actual increase reaches it.
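The stepwise behavior of the variable threshold can be traced with a small sketch. The class `VariableThreshold` and its method name are hypothetical; the starting threshold and increment of 3,000 are those given for service A, while the sequence of observed increases is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class VariableThreshold:
    """Access count increase threshold that steps up after each addition."""
    threshold: int   # current "Access Count Increase Threshold"
    increment: int   # "Threshold Increment"

    def reached(self, observed_increase: int) -> bool:
        """Return True, and raise the threshold, when the increase reaches it."""
        if observed_increase >= self.threshold:
            self.threshold += self.increment
            return True
        return False


policy = VariableThreshold(threshold=3000, increment=3000)
# Illustrative sequence of observed access count increases for service A.
trace = [(inc, policy.reached(inc), policy.threshold)
         for inc in (1000, 3000, 4500, 6000)]
```

Each tuple in `trace` holds the observed increase, whether a resource addition was triggered, and the threshold in force afterwards; the threshold moves from 3,000 to 6,000 and then to 9,000 as the second and fourth observations reach it.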

FIG. 14 shows an example data structure of a service flag dataset. This service flag dataset 111 c has three data fields titled “Service Name,” “URL,” and “Updateable Service Flag.” Those data items in each row form an associated set of information, or a record, that indicates whether the service in question is an updateable data service.

The “Service Name” field contains the name of a service, and the “URL” field indicates the URL of that service. The “Updateable Service Flag” field contains a flag showing whether the service in question is an updateable data service, i.e., whether the service allows users to update its data. For example, electronic mail services fall under the category of updateable data services since they provide users with a storage space for storing received messages. Non-updateable data services, on the other hand, include simple information providing services that only allow the users to browse their web pages.

Updateable data services tend to consume more and more storage space as the number of accesses increases. For this reason, an increase in the access count calls for consideration of additional storage devices, as well as of server enhancement. In contrast, non-updateable data services do not consume additional storage space, no matter how much the access count increases. Increased access to a non-updateable data service may actually necessitate server addition, but it will not require extra storage devices. The service flag dataset of FIG. 14 makes distinctions between updateable and non-updateable data services by setting a corresponding updateable service flag to “1” for the former, and “0” for the latter.

FIG. 15 shows an example data structure of a storage addition policy dataset. This storage addition policy dataset 111 d has the following data fields: “Service Name,” “Access Count Increase Threshold,” “Sampling Interval,” “Additional Storage Size,” and “Threshold Increment.” A record concerning storage addition policy is formed from such data items associated in the same row.

The “Service Name” field contains the name of a service. The “Access Count Increase Threshold” field contains a threshold value for the access count increase, which is used to determine whether or not to install an additional storage device. Note that it is a variable threshold, which increases each time a new storage device is added. The “Sampling Interval” field specifies at what intervals a new access count should be collected. The “Additional Storage Size” field gives a storage capacity to be added when the observed access count increase equals or exceeds the variable threshold. The “Threshold Increment” field specifies an increment of the access count increase threshold, which applies each time a new storage device is added.

Suppose, for example, that the current access count increase threshold is set to 5,000 for service A. At predetermined intervals, the load balancer 210 reports how many access requests for service A it has routed to corresponding servers during each given period. If a new access count exceeds the previous count by 5,000 or more, the administration server 100 allocates an additional 300-GB storage device to service A, and at the same time, the threshold is increased by an increment of 5,000.

FIG. 16 is a flowchart of a resource addition process based on the actual number of service requests distributed from a load balancer. This process includes the following steps:

-   (Step S31) The data collector 171 selects a service whose sampling interval has expired. (The sampling interval is specified in the server addition policy dataset 111 b.)
-   (Step S32) The data collector 171 communicates with the load balancer manager 120 to collect a new access count of the service selected at step S31.
    -   More specifically, the load balancer manager 120 sends a message requesting the load balancer 210 to report a new access count of the service in question. In response to this request message, the load balancer 210 sends back the number of accesses made to the specified service during a predetermined period. The load balancer manager 120 passes the received data to the data collector 171. The data collector 171 records this service access count in a log file 114.
-   (Step S33) By consulting the log file 114, the resource usage analyzer 172 evaluates the increase in access count of the selected service.
    -   More specifically, the resource usage analyzer 172 first extracts necessary log records out of the log file 114, while skipping those unrelated to the service selected at step S31. What is extracted here is the latest record and an old record that was collected in the preceding sampling cycle. The resource usage analyzer 172 then compares the two records, thereby identifying the amount of increase in access count.
-   (Step S34) The resource usage analyzer 172 compares the observed access count increase with each access count increase threshold defined in the server addition policy dataset 111 b. If the observed increase equals or exceeds one such threshold, the resource usage analyzer 172 determines that a server has to be added, thus advancing the process to step S35. If the observed increase is still below the threshold, the process skips to step S37.
-   (Step S35) Now that the need for a server is established, the configuration controller 173 begins actual tasks of allocating a server to the selected service.
-   (Step S36) The configuration controller 173 updates the “Access Count Increase Threshold” field of the server addition policy dataset 111 b.
-   (Step S37) The resource usage analyzer 172 consults the service flag dataset 111 c to determine whether the service selected at step S31 has its updateable service flag set to one. That is, it is tested whether the selected service is an updateable data service or a non-updateable data service. If the flag is set to one, the process advances to step S38. If it is zero, the process skips to step S41.
-   (Step S38) The resource usage analyzer 172 compares the observed access count increase with each access count increase threshold defined in the storage addition policy dataset 111 d. If the observed increase equals or exceeds one such threshold, the resource usage analyzer 172 determines that a storage device has to be added, thus advancing the process to step S39. If the observed increase is still below the threshold, the process skips to step S41.
-   (Step S39) The configuration controller 173 adds a storage device according to the storage addition policy dataset 111 d.
-   (Step S40) The configuration controller 173 updates the “Access Count Increase Threshold” field of the storage addition policy dataset 111 d.
-   (Step S41) The data collector 171 determines whether there is an instruction to stop this additional resource allocation process. If there is, the process is terminated. If not, the process returns to step S31.
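Steps S33 through S40 may be summarized in a short sketch for a single service and a single sample. The names `AdditionPolicy` and `process_sample` and the action strings are hypothetical, and the number of servers added per trigger (one) is an assumption. The thresholds and increments of 3,000 and 5,000 and the 300-GB storage size are those of the examples given for service A.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdditionPolicy:
    """One row of dataset 111 b (servers) or 111 d (storage)."""
    threshold: int   # current "Access Count Increase Threshold"
    increment: int   # "Threshold Increment"
    amount: int      # servers to add, or storage size in GB


def process_sample(previous_count: int, new_count: int, updateable: bool,
                   server_policy: AdditionPolicy,
                   storage_policy: AdditionPolicy) -> List[str]:
    """Handle one sampled access count for one service."""
    actions: List[str] = []
    increase = new_count - previous_count                      # step S33
    if increase >= server_policy.threshold:                    # step S34
        actions.append(f"add_server:{server_policy.amount}")   # step S35
        server_policy.threshold += server_policy.increment     # step S36
    # Step S37: storage is considered only for updateable data services.
    if updateable and increase >= storage_policy.threshold:    # step S38
        actions.append(f"add_storage:{storage_policy.amount}GB")  # step S39
        storage_policy.threshold += storage_policy.increment   # step S40
    return actions


servers = AdditionPolicy(threshold=3000, increment=3000, amount=1)
storage = AdditionPolicy(threshold=5000, increment=5000, amount=300)
first = process_sample(10000, 13500, True, servers, storage)
second = process_sample(13500, 19500, True, servers, storage)
```

In this run, the first sample (an increase of 3,500) triggers only a server addition, and the second (an increase of 6,000) triggers both a server and a storage addition, after which the thresholds stand at 9,000 and 10,000. Passing `updateable=False` suppresses the storage check regardless of the increase.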

The above sequence of processing steps permits the administration server 100 to select an appropriate resource based on a service-specific access count. The proposed process also checks whether the service in question is an updateable data service before allocating a storage device, thus preventing unnecessary addition from happening.

Program Storage Media

The above-described processing mechanisms of the present invention are implemented on a computer system. The functions necessary for realizing a data management device are encoded and provided in the form of computer programs. The computer system executes such programs to provide the intended functions of the present invention. For the purpose of storage and distribution, the programs are stored in computer-readable storage media, which include magnetic storage media, optical discs, magneto-optical storage media, and solid state memory devices. Magnetic storage media include hard disk drives (HDD), flexible disks (FD), and magnetic tapes. Optical discs include digital versatile discs (DVD), DVD-RAM, compact disc read-only memory (CD-ROM), CD-Recordable (CD-R), and CD-Rewritable (CD-RW). Magneto-optical storage media include magneto-optical discs (MO).

Portable storage media, such as DVD and CD-ROM, are suitable for the distribution of program products. Network-based distribution of software programs is also possible, in which master program files are made available in a server computer for downloading to user computers via a network.

A user computer stores in its local storage unit the necessary programs, which have previously been installed from a portable storage medium or downloaded from a server computer. The computer performs intended functions by executing the programs read out of the local storage unit. As an alternative way of program execution, the computer may execute programs by reading out program files directly from a portable storage medium. Another alternative method is that the user computer dynamically downloads programs from a server computer when they are demanded and executes them upon delivery.

CONCLUSION

The present invention proposes a mechanism of determining what kind of hardware resources are lacking in the managed system, depending on the degree of increase in the system load. This feature makes it possible to select and allocate an appropriate resource to the system.

The foregoing is considered as illustrative only of the principles of the present invention. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and applications shown and described, and accordingly, all suitable modifications and equivalents may be regarded as falling within the scope of the invention in the appended claims and their equivalents.

1. A system configuration management program for adding hardware resources to a system in operation, the program causing a computer to function as: an operating condition monitor that observes load of the system in operation; a resource addition decision unit that determines whether it is necessary to add hardware resources to the system, and what kind of hardware resources should be added, according to the degree of increase in the load observed in a predetermined period, with reference to a resource addition policy dataset that provides policy rules including whether to add a server or a storage unit; a server addition unit that activates a spare server if the resource addition decision unit determines that an additional server is required, the spare server having thus far been connected to the system as a spare resource in a standby state; and a storage addition unit that permits the system to make access to a spare storage device if the resource addition decision unit determines that an additional storage unit is required, the spare storage device having thus far been connected to the system as a spare resource that cannot be accessed from the system.
2. The system configuration management program according to claim 1, wherein: the system offers a plurality of services; the operating condition monitor observes load of each individual service; the resource addition decision unit determines whether it is necessary to add hardware resources to the system and what kind of hardware resources should be added, for each individual service; the server addition unit causes the spare server to start offering one of the services that the resource addition decision unit has determined as being in need of an additional server; and the storage addition unit enables access to the spare storage device from one of the services that the resource addition decision unit has determined as being in need of an additional storage device.
3. The system configuration management program according to claim 1, wherein: the operating condition monitor observes storage capacity usage in the system; and the resource addition decision unit determines whether it is necessary to add hardware resources to the system, and what kind of hardware resources should be added, according to the degree of increase in the storage capacity usage observed in the predetermined period, with reference to the resource addition policy dataset.
4. The system configuration management program according to claim 1, wherein: the operating condition monitor observes the number of accesses to the system; and the resource addition decision unit determines whether it is necessary to add hardware resources to the system, and what kind of hardware resources should be added, according to the degree of increase in the number of accesses observed in the predetermined period, with reference to the resource addition policy dataset.
5. The system configuration management program according to claim 4, wherein the resource addition decision unit activates the server addition unit and/or storage addition unit, if the observed increase in the number of accesses equals or exceeds an access count increase threshold that is defined in the resource addition policy dataset.
6. The system configuration management program according to claim 5, wherein the resource addition decision unit increases the access count increase threshold by a predetermined increment each time a new hardware resource is added to the system.
7. A system configuration management method for adding hardware resources to a system in operation, the method comprising the steps of: (a) observing load of the system in operation; (b) determining whether it is necessary to add hardware resources to the system, and what kind of hardware resources should be added, according to the degree of increase in the load observed in a predetermined period, with reference to a resource addition policy dataset that provides policy rules including whether to add a server or a storage unit; (c) activating a spare server if it is determined at said step (b) that an additional server is required, the spare server having thus far been connected to the system as a spare resource in a standby state; and (d) permitting the system to make access to a spare storage device if it is determined at said step (b) that an additional storage unit is required, the spare storage device having thus far been connected to the system as a spare resource that cannot be accessed from the system.
8. A system configuration management device for adding hardware resources to a system in operation, comprising: an operating condition monitor that observes load of the system in operation; a resource addition decision unit that determines whether it is necessary to add hardware resources to the system, and what kind of hardware resources should be added, according to the degree of increase in the load observed in a predetermined period, with reference to a resource addition policy dataset that provides policy rules including whether to add a server or a storage unit; a server addition unit that activates a spare server if the resource addition decision unit determines that an additional server is required, the spare server having thus far been connected to the system as a spare resource in a standby state; and a storage addition unit that permits the system to make access to a spare storage device if the resource addition decision unit determines that an additional storage unit is required, the spare storage device having thus far been connected to the system as a spare resource that cannot be accessed from the system.
9. A computer-readable storage medium storing a system configuration management program for adding hardware resources to a system in operation, the program causing a computer to function as: an operating condition monitor that observes load of the system in operation; a resource addition decision unit that determines whether it is necessary to add hardware resources to the system, and what kind of hardware resources should be added, according to the degree of increase in the load observed in a predetermined period, with reference to a resource addition policy dataset that provides policy rules including whether to add a server or a storage unit; a server addition unit that activates a spare server if the resource addition decision unit determines that an additional server is required, the spare server having thus far been connected to the system as a spare resource in a standby state; and a storage addition unit that permits the system to make access to a spare storage device if the resource addition decision unit determines that an additional storage unit is required, the spare storage device having thus far been connected to the system as a spare resource that cannot be accessed from the system.